Patent Acceptance Predictor & Sentiment Analysis

A HuggingFace-Streamlit application for sentiment analysis and US patent acceptance prediction. Project for NYU CS-GY-6613: Artificial Intelligence, Spring 2023.

- Date(s): March 2023 - April 2023
- Platforms: HuggingFace, Streamlit
- Topics: Sentiment Analysis, Patent Acceptance Prediction, Natural Language Processing (NLP), Machine Learning, Python
- Links: Online HuggingFace Space Demo [PRIVATE] Github

As the name implies, our app performs two core functions: Sentiment Analysis and Patent Acceptance Prediction. Originally built as milestones for NYU’s CS-GY-6613: Artificial Intelligence course (Spring 2023), this app is openly available and hosted on HuggingFace.

If you’re interested in NLP, here’s a comprehensive example of just that!

Not sure whether your text might evoke certain emotions? Try out our Sentiment Analysis tool. Determine whether your text reads as positive, negative, or something else by selecting different emotion models from the dropdown provided. Alternatively, are you trying to file a patent? Use our Patent Acceptance Prediction tool to check your chances of being accepted. Just enter your patent’s Abstract and list of Claims and see how your abstract scores.

The USPTO application is divided into several directories. The important files are laid out as follows:

data/
- train.json
- val.json
src/
- main.py
- train.ipynb
- val.ipynb

Both train.json and val.json contain the original USPTO data, trimmed down to only the relevant fields of each recorded patent and split into training and validation sets. The validation data in val.json also serves the online USPTO application as a set of pre-set patents that a user can select when using the patent prediction function. In addition, val.ipynb was used to validate the model’s accuracy.
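To illustrate how val.json could feed the pre-set patent picker, here is a minimal sketch. The record layout is an assumption: the file is taken to be a JSON array of objects with hypothetical "abstract" and "claims" keys, which may not match the actual trimmed-down USPTO fields.

```python
import json
from pathlib import Path


def load_presets(path):
    """Read the validation file and return its list of patent records.

    Assumes the file holds a JSON array of objects (hypothetical layout).
    """
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def preset_fields(record):
    """Return the (abstract, claims) pair for one record.

    The key names are hypothetical; missing keys fall back to an empty string.
    """
    return record.get("abstract", ""), record.get("claims", "")
```

A selected record's two fields could then pre-fill the Abstract and Claims inputs of the prediction tool.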

The primary back-end code is stored in main.py, which runs the application on the HuggingFace Space UI. The application uses Streamlit to render UI elements on the screen. All models run on the Transformers and Tokenizers libraries from HuggingFace.
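As a rough picture of how a single Streamlit script can serve both features, consider the sketch below. The widget layout, the labels, and the render and run_feature names are assumptions for illustration, not the contents of the actual main.py. The Streamlit module is passed in as an argument so the layout logic can be exercised without a running app.

```python
FEATURES = ("Sentiment Analysis", "Patent Acceptance Prediction")


def render(st, run_feature):
    """Draw a minimal page and show the result for the chosen feature.

    st          -- the streamlit module (or any object with the same calls)
    run_feature -- hypothetical callback: (feature_name, text) -> result
    """
    st.title("Patent Acceptance Predictor & Sentiment Analysis")
    feature = st.selectbox("Feature", FEATURES)
    text = st.text_area("Input text")
    # st.button returns True only on the rerun triggered by the click.
    if st.button("Submit") and text.strip():
        st.write(run_feature(feature, text))
```

In a script launched with `streamlit run`, one would `import streamlit as st` and call `render(st, ...)` with a callback that dispatches to the appropriate model.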

The application has two features: Sentiment Analysis (Milestone #2) and USPTO Patent Acceptance Prediction (Milestone #3). Both run from main.py. Sentiment Analysis relies on publicly hosted pre-trained models from HuggingFace, four in particular, which the user selects from the in-app dropdown.
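Sentiment classification with a pre-trained HuggingFace model can be sketched with the Transformers pipeline API. The model names are not listed here, so the sketch takes the name as a parameter. The helper names are hypothetical, and the import is deferred so the result-handling logic can be tried without downloading a model.

```python
def load_classifier(model_name):
    """Build a text-classification pipeline for the given Hub model name."""
    from transformers import pipeline  # deferred: only needed for real models

    return pipeline("text-classification", model=model_name)


def top_label(text, classifier):
    """Return the highest-scoring (label, score) pair for one piece of text.

    classifier is any callable that returns a list of
    {"label": ..., "score": ...} dicts, as a Transformers pipeline does.
    """
    results = classifier(text)
    best = max(results, key=lambda r: r["score"])
    return best["label"], round(best["score"], 3)
```

Switching models in the dropdown would then amount to calling `load_classifier` with a different name and reusing `top_label` unchanged.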

The Patent Acceptance Prediction feature uses two fine-tuned models, both built on the pre-existing distilbert-base-uncased model and fine-tuned on the USPTO dataset. The tokenizer used to parse text comes from the same distilbert-base-uncased model but is left unmodified.
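The pairing of an unmodified base tokenizer with a fine-tuned checkpoint might look like the sketch below. Several details are assumptions: model_path is a hypothetical location of one fine-tuned checkpoint, the model is taken to be a two-label sequence classifier, and label index 1 is taken to mean "accepted". How the app combines the outputs of its two models is not described here, so the sketch scores one text with one model.

```python
import math

BASE_CHECKPOINT = "distilbert-base-uncased"  # base model named in the text


def load_tokenizer_and_model(model_path):
    """Load the unmodified base tokenizer and one fine-tuned classifier.

    model_path is a hypothetical path or Hub id of a fine-tuned checkpoint.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_CHECKPOINT)
    model = AutoModelForSequenceClassification.from_pretrained(model_path)
    return tokenizer, model


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def acceptance_probability(text, tokenizer, model, accepted_index=1):
    """Score one text; accepted_index=1 is an assumed label mapping."""
    import torch

    # DistilBERT accepts at most 512 tokens, so longer inputs are truncated.
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return softmax(logits)[accepted_index]
```

Because patent claims are often longer than 512 tokens, truncation means only the start of a long input influences the score in this sketch.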